
Text File | 1993-01-30 | 11KB | 364 lines

LhA V1.32 Application Info

File structure and algorithms.

By Stefan Boberg 1991,92

NB: This is an early version of the document, so it is not complete.


Format of a LZH / LHA file
--------------------------

LHA files have exactly the same file format and structure as LZH files,
but LHA files are generally compressed with -lh5- compression, while LZH
files are generally -lh1- compressed (see the section about compression
algorithms). Files can be stored in arbitrary order in the archive file.

The overall file format is as follows:

    [file header]
    [file data]
    [file header]
    [file data]
      .
      .
      .
    [archive terminator]

The file header is laid out as follows:

Case 1: (header level 0)

    Header size (in bytes)                  1 byte
    Header checksum                         1 byte
    Storage method                          5 bytes
    Compressed size                         4 bytes
    Original size                           4 bytes
    Last mod file date & time               4 bytes
    File attributes                         1 byte
    Header level [0]                        1 byte
    Filename length                         1 byte
    Filename & filenote                     variable size
    File CRC-16                             2 bytes

Case 2: (header level 1)

    Header size (in bytes)                  1 byte
    Header checksum                         1 byte
    Storage method                          5 bytes
    Compressed size                         4 bytes
    Original size                           4 bytes
    Last mod file date & time               4 bytes
    File attributes                         1 byte
    Header level [1]                        1 byte
    Filename length                         1 byte
    Filename & filenote                     variable size
    File CRC-16                             2 bytes
    Host Operating System                   1 byte
    Extension size                          2 bytes
    Extension data                          variable size
      ...
    Extension terminator [0]                2 bytes

Case 3: (header level 2)

    Header size (in bytes)                  2 bytes
    Storage method                          5 bytes
    Compressed size                         4 bytes
    Original size                           4 bytes
    Last mod file date & time (UNIX-Fmt)    4 bytes
    File attributes                         1 byte
    Header level [2]                        1 byte
    Filename length                         1 byte
    Filename & filenote                     variable size
    File CRC-16                             2 bytes
    Host Operating System                   1 byte
    Extension size                          2 bytes
    Extension data                          variable size
      ...
    Extension terminator [0]                2 bytes

The compressed file data follows immediately after the last header byte.

The archive terminator is a single 0 byte after the last data byte of the
last file in the archive.
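
The fixed part of the level-0 and level-1 layouts above can be read from a
byte buffer roughly as follows. This is only a sketch: the struct, the
function names and the constant LZH_FIXED_LEN are hypothetical and do not
come from LhA itself, and the little-endian byte order is the one stated in
the "Explanation of fields" section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of the fixed fields shared by level-0 and
   level-1 headers; the names are illustrative, not taken from LhA. */
typedef struct {
    uint8_t  header_size;      /* offset 0 */
    uint8_t  header_checksum;  /* offset 1 */
    char     method[6];        /* offsets 2..6, NUL-terminated here */
    uint32_t compressed_size;  /* offset 7 */
    uint32_t original_size;    /* offset 11 */
    uint32_t timestamp;        /* offset 15 */
    uint8_t  attributes;       /* offset 19 */
    uint8_t  level;            /* offset 20 */
    uint8_t  name_length;      /* offset 21 */
} lzh_fixed_header;

/* Bytes before the variable-size filename field (assumed name). */
enum { LZH_FIXED_LEN = 22 };

/* Read a 32-bit quantity stored least significant byte first. */
static uint32_t lzh_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short for the fixed part. */
int lzh_parse_fixed(const uint8_t *buf, size_t len, lzh_fixed_header *out)
{
    if (len < LZH_FIXED_LEN)
        return -1;
    out->header_size     = buf[0];
    out->header_checksum = buf[1];
    memcpy(out->method, buf + 2, 5);
    out->method[5]       = '\0';
    out->compressed_size = lzh_rd32(buf + 7);
    out->original_size   = lzh_rd32(buf + 11);
    out->timestamp       = lzh_rd32(buf + 15);
    out->attributes      = buf[19];
    out->level           = buf[20];
    out->name_length     = buf[21];
    return 0;
}
```

Going by the three tables, the header level byte sits at offset 20 in every
case, so a reader could inspect buf[20] first and only then decide how to
interpret the first two bytes.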
Explanation of fields
---------------------

All fields are encoded in Intel format, i.e. 16-bit quantities are stored
with the least significant byte first. 32-bit quantities are stored as two
16-bit Intel words with the least significant word first.

Header size

    This unsigned byte contains the length of the header, excluding the
    header checksum byte and the header size byte itself.

    With level-1 headers, the extended headers are NOT included in the
    header size count (except for the first two-byte length word).

    With level-2 headers, this is a two-byte word field containing the
    length of the entire header including all extended headers.

Header checksum

    This byte contains the modulo-256 checksum of the header, which is
    calculated as follows (in C):

        unsigned char calcChecksum(const unsigned char *header)
        {
            unsigned char length   = header[0];   /* Header size field */
            unsigned char checksum = 0;

            /* Sum header[2] .. header[length + 1], i.e. every byte
               after the header size and header checksum bytes. */
            while (length) {
                checksum += header[length + 1];
                length--;
            }
            return checksum;   /* unsigned char wraps modulo 256 */
        }

Storage method

    This is a 5-byte ASCII char array containing the storage method ID.
    See the section about compression methods for a list of IDs.

Compressed size
Original size

    These 4-byte fields contain the size of the file in its compressed
    and original state, respectively.

Last file modification date & time

    The date and time are encoded in standard MS-DOS format. The 32-bit
    word is divided into bit fields like this:

        Bits
        25 - 31    (Year - 1980)
        21 - 24    Month         [1..12]
        16 - 20    Day           [1..31]
        11 - 15    Hour          [0..23]
         5 - 10    Minute        [0..59]
         0 -  4    Seconds/2     [0..29]

    With level-2 headers, things are a bit different. In this case the
    date is stored in UNIX format. A UNIX timestamp is a 32-bit integer
    containing the number of seconds since January 1, 1970.

File attributes

    This byte field contains the file attribute bits. The format depends
    on the host operating system.
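
The MS-DOS bit layout described under "Last file modification date & time"
can be unpacked with a few shifts and masks. The sketch below assumes the
32-bit word has already been assembled from its little-endian bytes; the
struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical broken-down time; field names are illustrative only. */
typedef struct {
    int year, month, day, hour, minute, second;
} lzh_datetime;

/* Split an MS-DOS format date/time word into its bit fields. */
lzh_datetime lzh_decode_dos_time(uint32_t t)
{
    lzh_datetime d;
    d.year   = (int)((t >> 25) & 0x7F) + 1980;  /* bits 25-31 */
    d.month  = (int)((t >> 21) & 0x0F);         /* bits 21-24 */
    d.day    = (int)((t >> 16) & 0x1F);         /* bits 16-20 */
    d.hour   = (int)((t >> 11) & 0x1F);         /* bits 11-15 */
    d.minute = (int)((t >> 5)  & 0x3F);         /* bits  5-10 */
    d.second = (int)(t & 0x1F) * 2;             /* bits  0-4, stored as seconds/2 */
    return d;
}
```

Because seconds are stored divided by two, odd second values cannot be
represented in this format.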
Header level

    This byte field indicates what kind of header this is. It can
    currently be 0 (original LhArc format), 1 or 2 (Unix LHarc/LHA
    format).

Filename length

    This field contains the length (in bytes) of the filename.

    Amiga LhArc/LhA stores filenotes in level-0 headers in the filename
    field. When a filenote is present, the filename is terminated by a
    null byte and the filenote follows it (a filename without a filenote
    is not null-terminated). The length of the filenote and the null byte
    should be included in the filename length count. This way of storing
    the filenotes is compatible with all versions of LhArc, so Amiga LZH
    archives with filenotes can be processed on other platforms without
    problems.

Filename & filenote

    This field contains the filename and (optional) filenote.

File CRC-16

    This field contains the CRC-16 of the source (uncompressed) file. It
    is used to check the integrity of the archive during extract and test
    operations.


CRC
---

The CRC is a standard ANSI 16-bit CRC. It is calculated as follows (in C):

    extern unsigned short crctable[256];   /* filled by make_crctable() */

    unsigned short calcCRC(const unsigned char *buffer, unsigned int length)
    {
        unsigned short crc = 0;
        unsigned int i;

        for (i = 0; i < length; i++)
            crc = crctable[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);

        return crc;
    }

The CRC table is built as follows:

    unsigned short crctable[256];

    void make_crctable(void)
    {
        unsigned int i, j, r;

        for (i = 0; i < 256; i++) {
            r = i;
            for (j = 0; j < 8; j++) {
                if (r & 1)
                    r = (r >> 1) ^ 0xA001;
                else
                    r >>= 1;
            }
            crctable[i] = (unsigned short)r;
        }
    }


Extended headers
----------------

The `extended headers' are used in level-1 and level-2 headers to store
optional or variably-sized information such as filenotes, operating-system
specific attributes etc.

The general structure of an extended header is:

    Length    [2 bytes]
    Type      [1 byte]
    Data      [Length - 3 bytes]

The length count includes the length, type and data fields, i.e.
data field length + 3 = Length.

The extended-headers block is terminated by 2 zero bytes (zero length).
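
Taking the Length / Type / Data structure above at face value, a block of
extended headers could be stepped through one record at a time. This is a
minimal sketch under that reading only: the type lzh_ext and the function
lzh_next_extension are hypothetical names, and buf is assumed to start at
the first two-byte length word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one extended header record. */
typedef struct {
    uint8_t        type;      /* Type byte */
    const uint8_t *data;      /* points into the caller's buffer */
    size_t         data_len;  /* Length - 3 */
} lzh_ext;

/* Reads the record at *pos and advances *pos past it.
   Returns 1 if a record was read, 0 at the zero-length terminator,
   and -1 if the block is truncated or the length is impossible. */
int lzh_next_extension(const uint8_t *buf, size_t len, size_t *pos, lzh_ext *out)
{
    size_t rec;

    if (*pos > len || len - *pos < 2)
        return -1;

    /* 16-bit length, least significant byte first */
    rec = (size_t)buf[*pos] | ((size_t)buf[*pos + 1] << 8);
    if (rec == 0) {
        *pos += 2;            /* consume the terminator */
        return 0;
    }
    if (rec < 3 || rec > len - *pos)
        return -1;

    out->type     = buf[*pos + 2];
    out->data     = buf + *pos + 3;
    out->data_len = rec - 3;
    *pos += rec;
    return 1;
}
```

A caller would typically loop while the function returns 1 and treat -1 as
a damaged archive.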
The currently implemented headers are:

    Type
    ----
    0       Common header     (Data = Header CRC16)

    1       Filename header   (Data = ASCII string of filename,
                               excluding directory names)

    2       Dirname header    (Data = ASCII string of directory name,
                               excluding trailing slash).
                              Node delimiter is 0xFF (octal 0377,
                              decimal 255)

    0x40    Attribute header  (Data = Two-byte word containing file
                               attributes). This overrides the
                              attribute field in the main header.

    0x71    Filenote header   (Data = ASCII string of filenote)


Compression modes
-----------------

Currently, a file can be stored in the archive in one of four ways: it can
be STORED (not compressed) or FROZEN (compressed) in three different ways.
The method IDs are listed in the table below:

    Method       ID
    ------------------
    Stored       -lh0-
    Frozen       -lh1-
    Frozen       -lh4-
    Frozen       -lh5-
    Directory    -lhd-
    ------------------

I. STORED

    Compression:

        A stored file is not compressed. The file data should be copied
        directly from the source file to the archive. The CRC16 for the
        file must be calculated and stored in the header for the data
        integrity check.

    Decompression:

        A stored file is not compressed. The file data can be copied
        directly from the archive to the destination, while calculating
        the CRC16 for the file.

II. FROZEN (-lh1-)

    Compression:

        LZ77 with a 4096-byte window. Literals and copies encoded with
        dynamic order-0 Huffman codes. Distance codes encoded with fixed
        order-0 Huffman codes.

        [ No algorithm description in this early document version ]

    Decompression:

        [ No algorithm description in this early document version ]

III. FROZEN (-lh4-)

    Compression:

        This method is exactly the same as -lh5-, but with a window size
        of 4096 characters. See the description of -lh5- for more info.

    Decompression:

        This method is exactly the same as -lh5- and can be decompressed
        with the same decompression routine. There is no difference
        between -lh5- and -lh4- from the decompressor's point of view.

IV. FROZEN (-lh5-)

    Compression:

        LZ77 with an 8192-byte window.
        Literals and copies encoded with block-adaptive order-0 Huffman
        codes. Number of distance bits encoded with another set of
        block-adaptive Huffman codes.

        [ No algorithm description in this early document version ]

    Decompression:

        No buffer initialization required.

        [ No algorithm description in this early document version ]

V. Directory (-lhd-)

    Compression:

        No compression. Set the CRC-16 field to 0000. The directory name
        should include a trailing slash (like in `dir1/dir2/', and not
        `dir1/dir2').

    Decompression:

        No compression. Just create the directory whose name is in the
        filename field.
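
The method IDs and window sizes listed under "Compression modes" can be
collected in a small lookup table so that an extractor can dispatch on the
5-byte storage method field. This is only an illustrative sketch: the enum,
the struct and lzh_find_method are hypothetical names, and the window sizes
are the ones quoted in this document.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the storage methods described above. */
typedef enum { LZH_STORED, LZH_FROZEN, LZH_DIRECTORY } lzh_kind;

typedef struct {
    const char *id;      /* 5-character method ID */
    lzh_kind    kind;
    unsigned    window;  /* LZ77 window in bytes, 0 when not compressed */
} lzh_method;

static const lzh_method lzh_methods[] = {
    { "-lh0-", LZH_STORED,    0    },
    { "-lh1-", LZH_FROZEN,    4096 },
    { "-lh4-", LZH_FROZEN,    4096 },
    { "-lh5-", LZH_FROZEN,    8192 },
    { "-lhd-", LZH_DIRECTORY, 0    },
};

/* id points at the 5-byte storage method field; it need not be
   NUL-terminated. Returns NULL for an ID not listed in this document. */
const lzh_method *lzh_find_method(const char *id)
{
    size_t i;

    for (i = 0; i < sizeof lzh_methods / sizeof lzh_methods[0]; i++) {
        if (memcmp(id, lzh_methods[i].id, 5) == 0)
            return &lzh_methods[i];
    }
    return NULL;
}
```

Returning NULL for unknown IDs lets the caller skip an entry it cannot
handle (using the compressed size field) instead of aborting.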